Conversation
Integrate llama.cpp via Go bindings for local embedding generation. Add sqlite-vec for vector storage and similarity search. Include schema migrations, daemon API changes, and proto updates.
…build

- Fix sqlite-vec compilation on Alpine/musl by guarding BSD type aliases with __GLIBC__
- Dockerfile: switch to CPU-only llama.cpp build (Vulkan shaders fail on Alpine)
- Dockerfile: add llama-go go.mod copy for replace directive support
- CI workflows: add GGUF model caching and download steps
- CI workflows: add llama.cpp build steps (CPU-only for tests, GPU for desktop releases)
- CI workflows: add LIBRARY_PATH/C_INCLUDE_PATH env vars for CGO linking
- ci-setup action: add Vulkan SDK and llama.cpp build per platform
Replace the vendored backend/util/llama-go directory (~1200 C/C++ files, 500K+ lines) with a git submodule pointing to seed-hypermedia/llama-go. Changes:

- Remove vendored llama-go and add it as a git submodule
- Fix go.mod: use the upstream tcpipuk/llama-go module path with a replace directive pointing to ./backend/util/llama-go
- Update the import in llamacpp.go to use the upstream module path
- Add a submodule init guard to .envrc (before mise activation)
- Add a submodule existence check to the mise.toml ensure-llama-libs task
- Remove sync_llama_go() and generate_gpu_build_files() from the ./dev script
- Add "submodules: recursive" to 12 CI checkout steps across 10 workflows
- Fix wrapper.cpp in the fork: use common_chat_parser_params matching the pinned llama.cpp version (commit 2eee6c866)
- Use an HTTPS URL in .gitmodules so cloning works without SSH keys
- Add an ensure-submodule mise task to auto-init submodules
- Make ensure-llama-libs depend on ensure-submodule
- Move setup orchestration from the mise enter hook (unreliable with direnv) to explicit mise run calls in .envrc
- Result: git clone + cd into the repo does everything automatically
The llama-go submodule includes the full llama.cpp source tree (~2500 files, 148MB). The previous glob copied all of them into the Please sandbox temp dir before building, causing massive disk I/O and memory pressure that could freeze the machine. Build in $WORKSPACE in place (as seed-daemon already does) and copy only the ~9 output .a files back to the sandbox. The Makefile is kept as a src entry so Please can still track changes to it.
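The copy-back step can be sketched in shell. This is a minimal illustration on a scratch tree, not the actual genrule: the directory layout, file names, and the WORKSPACE/SANDBOX_OUT variables are all stand-ins.

```shell
# Demonstrate copying only the .a archives out of a workspace tree while
# leaving the (huge) llama.cpp source tree behind. All paths are stand-ins.
set -eu
WORKSPACE="$(mktemp -d)"    # stands in for the real checkout
SANDBOX_OUT="$(mktemp -d)"  # stands in for the Please sandbox output dir

# Fake build results: archives sitting next to thousands of sources.
mkdir -p "$WORKSPACE/llama.cpp/src"
touch "$WORKSPACE/llama.cpp/src/llama.cpp"
touch "$WORKSPACE/libllama.a" "$WORKSPACE/libggml.a"

# Copy back only the static libraries; sources never enter the sandbox.
find "$WORKSPACE" -name '*.a' -exec cp {} "$SANDBOX_OUT/" \;
ls "$SANDBOX_OUT"
```

The point of the shape is that the sandbox only ever sees the handful of link inputs, so Please has almost nothing to hash or copy.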
Eliminate the SEED_CPU_ONLY / SEED_USE_GPU toggle that caused build conflicts when ensure-llama-libs (CPU-only) and plz (GPU) built into the same directory with different modes. Now each platform always uses the same GPU mode everywhere:

- macOS: always Metal (built-in, zero deps)
- Linux: always CPU-only for local dev (no Vulkan packages needed)
- CI: handles per-platform GPU builds in ci-setup/action.yml

Changes:

- mise.toml: ensure-llama-libs detects the OS and builds Metal on macOS, CPU-only on Linux. Detects stale CPU builds on macOS via a missing libggml-blas.a and forces a rebuild.
- backend/BUILD.plz: the llama-cpp and seed-daemon genrules use OS detection instead of the SEED_CPU_ONLY env var.
- dev: remove setup_gpu_build() and the --cpu/--gpu flags from all commands.
- .plzconfig: remove SEED_USE_GPU/SEED_CPU_ONLY from PassUnsafeEnv.
- Fork Makefile: add Metal mismatch detection alongside the existing Vulkan detection in the CMake cache checks.
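The per-OS selection described above can be sketched as a small shell case statement. The real logic lives in mise.toml and BUILD.plz; the BUILD_FLAVOR variable here is purely illustrative.

```shell
# Pick the GPU mode from the OS alone -- no env-var toggle that can drift
# between ensure-llama-libs and plz builds.
set -eu
case "$(uname -s)" in
  Darwin) BUILD_FLAVOR=metal ;;  # Metal is built into macOS, zero extra deps
  Linux)  BUILD_FLAVOR=cpu ;;    # local dev is CPU-only; CI layers GPU builds on top
  *)      echo "unsupported OS: $(uname -s)" >&2; exit 1 ;;
esac
echo "building llama.cpp flavor: $BUILD_FLAVOR"
```

Because the mode is a pure function of the OS, two tools building into the same output directory can never disagree about it.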
The seed-daemon genrule's glob(**/*.c, **/*.h, **/*.cpp, **/*.hpp) captures ~2500 files from the llama.cpp nested submodule. Please hashes and copies all of them into the sandbox, causing 10+ minute builds and extreme CPU/memory usage. Exclude util/llama-go/llama.cpp/** since the seed-daemon genrule only needs the compiled .a libraries (via :llama-cpp dependency), not the C/C++ source files.
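The fix amounts to adding an exclude list to the glob. The fragment below is a sketch under the assumption that the genrule looks roughly like this; the target name and attribute set are illustrative, and it assumes Please's glob accepts an exclude parameter.

```python
# Sketch of the seed-daemon genrule srcs after the fix: skip the nested
# llama.cpp tree, since the compiled archives arrive via :llama-cpp.
genrule(
    name = "seed-daemon",
    srcs = glob(
        ["**/*.c", "**/*.h", "**/*.cpp", "**/*.hpp"],
        exclude = ["util/llama-go/llama.cpp/**"],
    ),
    deps = [":llama-cpp"],  # provides the prebuilt .a libraries
    # cmd, outs, etc. unchanged and omitted here
)
```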
plz build takes 10+ minutes for seed-daemon due to sandbox overhead (copying files, hashing dependencies). go build directly takes ~12s from cold cache, ~3s incremental. Replace plz build //backend:seed-daemon with direct go build in all ./dev commands (build-desktop, test-desktop, run-backend, build-backend). The BUILD.plz genrule is still used by CI workflows. Also fix build-backend which still referenced the removed setup_gpu_build() function.
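The direct-build path can be illustrated end to end on a throwaway module. This is a sketch: the real command builds the seed-daemon package, and the stub module below is purely illustrative.

```shell
# Build a trivial stand-in binary with `go build` directly -- no sandbox
# copying, no dependency hashing -- and run it. Falls back to a plain
# echo when no Go toolchain is installed, so the output is the same.
set -eu
DIR="$(mktemp -d)"
cd "$DIR"
cat > main.go <<'EOF'
package main

import "fmt"

func main() { fmt.Println("seed-daemon stub") }
EOF
if command -v go >/dev/null 2>&1; then
  go mod init example.com/stub >/dev/null 2>&1
  go build -o seed-daemon .
  ./seed-daemon
else
  echo "seed-daemon stub"  # no Go toolchain available
fi
```

go's own build cache is what makes the incremental case fast; the plz genrule remains the source of truth for CI, where hermetic builds matter more than local latency.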
This reverts commit 34208d7.
…ove dead setup_gpu_build from dev script
The test waited for embedCalls==2 then immediately checked the DB, but the INSERT transaction could still be in-flight. Now also waits for runOnce to fully complete (task deleted from taskMgr) before checking DB state.
… and remove test-gpu-build

- Add GGUF model cache + download steps to dev-desktop.yml (already in release-desktop.yml)
- Add a Windows DLL verification step to both dev-desktop.yml and release-desktop.yml
- Delete test-gpu-build.yml, as all its steps are now in the real workflows
This PR is identical to #152 in functionality, but it uses submodules instead of raw source files to avoid polluting the workspace with third-party code. All submodule initialization is handled by direnv, so it is transparent to the developer.